-
Notifications
You must be signed in to change notification settings - Fork 443
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Removing Operator specific handling during a StudyJob run #387
Conversation
/assign @richardsliu @hougangliu |
var state katibapi.State = katibapi.State_RUNNING | ||
var cpTime *metav1.Time | ||
switch w.Kind { | ||
switch wkind.Kind { | ||
|
||
case DefaultJobWorker: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this case still needed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. But the string values are different currently. 'Complete' in https://github.com/kubernetes/kubernetes/blob/master/staging/src/k8s.io/api/batch/v1/types.go#L170 vs 'Succeeded' in https://github.com/kubeflow/tf-operator/blob/master/pkg/apis/common/v1beta2/common_types.go#L121
In general, since 'job' is a built-in K8s Kind, should we care about it matching with our JobCondition? I feel that the only necessary condition is that all custom job operators share the same JobStatus type.
Or other option is, we can change the common JobConditionType string to match the batch Job's JobConditionType. But new developers might not be aware of this assumption and may break the logic.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is ok for this version. One thing that we have discussed in the past is moving the common types into its own repo. If we go down that route, it will be easier to change then. What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yup. There is a discussion in kubeflow/mpi-operator#92 (comment)
if err := r.Client.Get(context.TODO(), nname, &job); err != nil { | ||
log.Printf("Client Get error %v for %v", err, nname) | ||
return WorkerStatus{} | ||
} | ||
if job.Status.Active == 0 && job.Status.Succeeded > 0 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems like we should be able to use the similar logic here to check for job completion: https://github.com/kubernetes/kubernetes/blob/master/staging/src/k8s.io/api/batch/v1/generated.proto#L158
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please see above comment
/lgtm |
jobStatus := commonv1beta1.JobStatus{} | ||
err := runtime.DefaultUnstructuredConverter.FromUnstructured(statusMap, &jobStatus) | ||
if err != nil { | ||
log.Printf("Error in converting unstructured to status:%v ", err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
s/status:%v/status: %v
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
return WorkerStatus{}
/lgtm |
/lgtm |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: hougangliu The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
During StudyJob run, there are no more operator specific code. The unstructured is used to create the resource, get the resource and get the status
The only remaining is the watch changes that has to be separately handled in #317. Current controller runtime code(0.1.7 and less) doesn't support dynamic watch. Refer: kubernetes-sigs/kubebuilder#422
Related: #341
This change is